Improving prevalence estimation through data fusion: methods and validation
Abstract
BACKGROUND Estimation of health prevalences is usually performed with a single survey. Some attempts have been made to integrate more than one source of data. We propose here to validate this approach through data fusion. Data fusion is the process of integrating two sources of data into one combined file. It allows us to take greater advantage of existing information collected in databases. Here, we use data fusion to improve the estimation of health prevalences for two primary health factors: cardiovascular diseases and diabetes.

METHODS We use a real data fusion operation on population health, where the imputation of basic health risk factors is used to enrich a large-scale survey on self-reported health status. We propose choosing the imputation methodology for this problem through a suite of validation statistics that assess the quality of the fused data. The compared imputation techniques have been chosen from among the main imputation methodologies: k-nearest neighbor, probabilistic modeling and regression. We use the 2006 Health Survey of Catalonia, which provides a complete report of the perceived health status. To deal with the uncertainty problem, we compare these methodologies under the single and multiple imputation frameworks.

RESULTS A suite of validation statistics allows us to discern the strengths and weaknesses of the studied imputation methods. Multiple imputation outperforms single imputation by providing better and much more stable estimates, according to the computed validation statistics. The summarized results indicate that the probabilistic methods preserve the multivariate structure better; sequential regression methods deliver greater accuracy of imputed data; and nearest neighbor methods end up with a more realistic distribution of imputed data.

CONCLUSIONS Data fusion allows us to integrate two sources of information in order to take greater advantage of the available data. Multiply imputed sequential regression models have the advantage of greater interpretability and can be used for health policy. Under certain conditions, more accurate estimates of the prevalences can be obtained using fused data (the original data plus the imputed data) than by using only the observed data.
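The abstract describes fusing a donor file (where a health factor is observed) with a recipient file (where it is imputed), and estimating a prevalence from the fused data under multiple imputation. A minimal Python sketch of that idea follows. It is not the authors' implementation: the function names `knn_draw` and `fused_prevalence`, the synthetic data, and the choice of a random draw among the k nearest donors are all illustrative assumptions. The estimates are pooled with Rubin's rules, and the single covariate is assumed to need no rescaling.

```python
import random
import statistics


def knn_draw(x, donors, k, rng):
    """Hot-deck draw: return the outcome of one randomly chosen donor
    among the k donors closest to x (squared Euclidean distance)."""
    nearest = sorted(
        donors,
        key=lambda d: sum((a - b) ** 2 for a, b in zip(x, d[0])),
    )[:k]
    return rng.choice(nearest)[1]


def fused_prevalence(donors, recipients, k=5, m=20, seed=0):
    """Estimate a prevalence from fused data (observed + imputed outcomes).

    donors:     list of (covariates, outcome) pairs, outcome in {0, 1}
    recipients: list of covariate tuples with the outcome missing
    Returns the pooled estimate and its total variance (Rubin's rules).
    """
    rng = random.Random(seed)
    observed = [y for _, y in donors]
    n = len(observed) + len(recipients)
    estimates, variances = [], []
    for _ in range(m):  # one completed (fused) data set per iteration
        imputed = [knn_draw(x, donors, k, rng) for x in recipients]
        p = (sum(observed) + sum(imputed)) / n
        estimates.append(p)
        variances.append(p * (1 - p) / n)  # simple binomial variance
    q_bar = statistics.mean(estimates)
    within = statistics.mean(variances)
    between = statistics.variance(estimates) if m > 1 else 0.0
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Synthetic illustration (hypothetical data, not the Health Survey of Catalonia):
# the only common covariate is age, and risk rises with age.
data_rng = random.Random(42)
donors = []
for _ in range(200):
    age = data_rng.uniform(20, 80)
    outcome = 1 if data_rng.random() < (age - 20) / 120 else 0
    donors.append(((age,), outcome))
recipients = [(data_rng.uniform(20, 80),) for _ in range(300)]

prevalence, variance = fused_prevalence(donors, recipients, k=5, m=20, seed=0)
print(f"pooled prevalence: {prevalence:.3f}, total variance: {variance:.6f}")
```

Setting `m=1` reduces this to single imputation, where the between-imputation variance term is zero and the imputation uncertainty is ignored. Note that drawing from nearest donors alone does not propagate parameter uncertainty, so this is a simplification of a proper multiple imputation procedure.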
Similar resources
A New Approach to Self-Localization for Mobile Robots Using Sensor Data Fusion
This paper proposes a new approach for calibration of dead reckoning process. Using the well-known UMBmark (University of Michigan Benchmark) is not sufficient for a desirable calibration of dead reckoning. Besides, existing calibration methods usually require explicit measurement of actual motion of the robot. Some recent methods use the smart encoder trailer or long range finder sensors such ...
Improvements in prevalence trend fitting and incidence estimation in EPP 2013
OBJECTIVE Describe modifications to the latest version of the Joint United Nations Programme on AIDS (UNAIDS) Estimation and Projection Package component of Spectrum (EPP 2013) to improve prevalence fitting and incidence trend estimation in national epidemics and global estimates of HIV burden. METHODS Key changes made under the guidance of the UNAIDS Reference Group on Estimates, Modelling a...
SPOT-5 Spectral and Textural Data Fusion for Forest Mean Age and Height Estimation
Precise estimation of the forest structural parameters supports decision makers for sustainable management of the forests. Moreover, timber volume estimation and consequently the economic value of a forest can be derived based on the structural parameter quantization. Mean age and height of the trees are two important parameters for estimating the productivity of the plantations. This research ...
A New Method for Multisensor Data Fusion Based on Wavelet Transform in a Chemical Plant
This paper presents a new multi-sensor data fusion method based on the combination of wavelet transform (WT) and extended Kalman filter (EKF). Input data are first filtered by a wavelet transform via Daubechies wavelet “db4” functions and the filtered data are then fused based on variance weights in terms of minimum mean square error. The fused data are finally treated by extended Kalman filter...
Integrated Approaches in Fusion Data Analysis
The concept of integrated data analysis in nuclear fusion requires the linkage of data and physical information. Summarizing the key steps for the analysis of transport in the core plasma, benefits of probabilistic modelling of single diagnostics are discussed. Concepts for full diagnostics models consisting of several diagnostics modules and linkage through mapping procedures are given in figu...